Task: plot the first five (modified) Legendre polynomials. Verify, numerically, that they are orthonormal.
The Legendre polynomials are designed to form a countable basis of the Hilbert space of functions defined on \([-1, +1]\):
\[P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = \frac{1}{2}(3x^2-1), \;\ldots,\; P_j(x) = \frac{1}{2^j j!} \frac{d^j}{dx^j}(x^2-1)^j, \;\ldots \]
Since computing the j-th derivative of \((x^2-1)^j\) would produce a very long expression, it is useful to express these polynomials through a recursive relation:
\[P_{j+1}(x) = \frac{(2j + 1) x P_j (x) - j P_{j-1}(x)}{j + 1}\]
legendre = function(x, j) {
  # Base cases of the recurrence: P_0(x) = 1 and P_1(x) = x
  if (j == 0) {
    return(rep(1, length(x)))  # vectorized so integrate() can also be used with j = 0
  }
  if (j == 1) {
    return(x)
  }
  # Recurrence: P_j(x) = ((2j - 1) x P_{j-1}(x) - (j - 1) P_{j-2}(x)) / j
  (((2*j - 1)*x*legendre(x, j - 1)) - (j - 1)*legendre(x, j - 2))/j
}
These polynomials are orthogonal…
\[\int_{-1}^{+1}P_{j_1}(x)\,P_{j_2}(x)\,dx = 0 \quad \forall \: j_1 \neq j_2\]
# Orthogonality check: inner product of P_5 and P_3 (the same pair, repeated three times)
a=integrate( function(x) legendre(x,5)*legendre(x,3), lower=-1,upper=1)$value
b=integrate( function(x) legendre(x,5)*legendre(x,3), lower=-1,upper=1)$value
e=integrate( function(x) legendre(x,5)*legendre(x,3), lower=-1,upper=1)$value
c(a,b,e)
[1] 6.245005e-17 6.245005e-17 6.245005e-17
…but not normal, since:
\[\int_{-1}^{+1}P^2_j(x)dx =\frac{2}{2j+1}\]
Actually:
\(\int_{-1}^{+1}P^2_1(x)dx =\) 0.6666667 \(\int_{-1}^{+1}P^2_2(x)dx =\) 0.4 \(\int_{-1}^{+1}P^2_3(x)dx =\) 0.2857143 \(\int_{-1}^{+1}P^2_4(x)dx =\) 0.2222222
However, we can normalize Legendre polynomials:
\[Q_j(x) = \sqrt{\frac{(2j + 1)}{2}} P_j (x)\]
q_x = function(x, j) {
  # Normalized Legendre polynomial Q_j(x) = sqrt((2j + 1)/2) * P_j(x)
  return(legendre(x, j)*sqrt((2*j + 1)/2))
}
… and these realize an orthonormal basis for \(L_2([-1, +1])\), where \(L_2\) is defined as: \[L_2([a, b]) = \{g : [a, b] \mapsto \mathbb{R} \;\text{s.t.}\; \left \| g \right \|_2^2 = \int_a^b\left | g(x) \right |^2 dx < \infty\}\]
indeed:
\(\int_{-1}^{+1}Q^2_1(x)dx =\) 1 \(\int_{-1}^{+1}Q^2_2(x)dx =\) 1 \(\int_{-1}^{+1}Q_1(x)Q_4(x)dx =\) 5.32524e-17
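These values can be reproduced with integrate; a minimal sketch using the q_x function defined above:
# Numerical orthonormality checks for the normalized basis Q_j
integrate(function(x) q_x(x, 1)^2, lower = -1, upper = 1)$value          # ~ 1
integrate(function(x) q_x(x, 2)^2, lower = -1, upper = 1)$value          # ~ 1
integrate(function(x) q_x(x, 1)*q_x(x, 4), lower = -1, upper = 1)$value  # ~ 0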
We could say that everything seems fine… except for one small detail: it is not going to work! The recursive function defined above branches into two recursive calls at each level, so evaluating the j-th basis function requires a number of calls that grows roughly exponentially in j. To spare our PC we should parallelize the computation across threads/cores, cache (memoize) the function, or simply use the gsl library…
library(gsl)
leg.basis2 <- function(j,x) sqrt((2*j + 1)/2)*legendre_Pl(j, x)
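As an alternative to gsl, caching the recursion also works; a minimal sketch assuming the memoise package is installed:
library(memoise)
# Wrapping the recursive function caches each (x, j) call, so the
# Fibonacci-like branching no longer recomputes the same polynomials.
legendre <- memoise(legendre)
q_x(0.3, 10)  # same value as before, without the exponential blow-up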
…The basis is ready; now let's set up the target function, the Doppler function rescaled to \([-1, 1]\):
\[g(x) =\sqrt{x(1-x)}\,\sin\!\left(\frac{2.1\pi}{x+0.05}\right)\] \[h(x)= g(0.5(x+1))\]
Evaluate the Fourier coefficients of the (rescaled) Doppler function with respect to our orthonormal basis:
\[\beta_{j}=\left \langle h,\phi_j \right \rangle_{L_2}= \int_{-1}^{+1}h(x)\,\phi_j(x)\,dx\]
First 10 coefficients:
[1] 0.0684015246 0.1007691143 0.0643306763 -0.0424656443 -0.1369864121
[6] -0.0949871788 0.0650198686 0.1341083357 0.0008080226 -0.1207487365
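For reference, a minimal sketch of how these coefficients can be obtained numerically (the names g, h and beta are ours; the integrate settings may need tuning where the Doppler oscillates rapidly):
# Doppler function and its version rescaled to [-1, 1]
g <- function(x) sqrt(x*(1 - x))*sin(2.1*pi/(x + 0.05))
h <- function(x) g(0.5*(x + 1))
# First 10 Fourier coefficients w.r.t. the orthonormal Legendre basis leg.basis2
beta <- sapply(0:9, function(j)
  integrate(function(x) h(x)*leg.basis2(j, x),
            lower = -1, upper = 1, subdivisions = 500L)$value)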
\newpage
Approximations with different numbers of terms, J = {5, 10, 25, 50, 100, 150}, are built as follows:
\[h_{J}(x)=\sum_{j=0}^{J-1}\beta_j\phi_j(x)\]
and we evaluate the error:
\[Error= \left \| h(x)-h_{J}(x) \right \|^2_{L_2}= \int_{-1}^{+1}(h(x)-h_{J}(x))^2dx\]
A small improvement can be obtained by selecting the J largest coefficients (in absolute value), so that only the most relevant basis functions are used:
\[h_{J}^*(x)=\sum_{j\in \Lambda_{J}}\beta_j\phi_j(x)\]
As a sanity check, note that the linear and non-linear approximations on the first 200 terms recover the same error value.
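A minimal sketch of the linear and non-linear approximants and of the squared \(L_2\) error (the helper names h.approx, h.approx.nl and err.J are ours, building on h, beta and leg.basis2 from above):
# Linear J-term approximant: sum of the first J coefficients
h.approx <- function(x, beta, J) {
  out <- 0
  for (j in 0:(J - 1)) out <- out + beta[j + 1]*leg.basis2(j, x)
  out
}
# Non-linear variant: keep only the J coefficients largest in absolute value (Lambda_J)
h.approx.nl <- function(x, beta, J) {
  keep <- order(abs(beta), decreasing = TRUE)[1:J] - 1
  out <- 0
  for (j in keep) out <- out + beta[j + 1]*leg.basis2(j, x)
  out
}
# Squared L2 error of the linear J-term approximation
err.J <- function(beta, J) {
  integrate(function(x) (h(x) - h.approx(x, beta, J))^2,
            lower = -1, upper = 1, subdivisions = 500L)$value
}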
\newpage
Task: Define all the functions needed to build the approximant function, learning, along the way, how to numerically evaluate double integrals.
We can build the Tensor basis using the definition provided:
\[\phi_{j_1,j_2}(x_1,x_2)=\phi_{j_1}(x_1)\,\phi_{j_2}(x_2), \enspace j_1,j_2=0,1,\ldots\]
for the cosine basis defined in previous labs
\[\phi_{0,0}(x_1,x_2)=1, \enspace \phi_{j_1,j_2}(x_1,x_2)=\sqrt{2}\cos(j_1\pi x_1)\,\sqrt{2}\cos(j_2 \pi x_2) \quad (j_1,j_2 \geq 1)\]
tensor.cosbasis = function(x, j1, j2){
  # Tensor-product cosine basis: phi_{j1}(x[1]) * phi_{j2}(x[2])
  (1*(j1 == 0) + sqrt(2)*cos(pi*j1*x[1])*(j1 > 0))*
    (1*(j2 == 0) + sqrt(2)*cos(pi*j2*x[2])*(j2 > 0))
}
To speed up the forthcoming computations we compared two libraries for double integrals, cubature and R2Cuba; the second performed better. With these libraries we can check the normality and orthogonality of our basis:
library(cubature)
library(R2Cuba)
## Check normality
adaptIntegrate(function(x) tensor.cosbasis(x,1,2)^2, lowerLimit = c(0,0),
upperLimit = c(1,1),maxEval = 50000)
$integral
[1] 1
$error
[1] 1.526557e-16
$functionEvaluations
[1] 255
$returnCode
[1] 0
cuhre(2,1,function(x) tensor.cosbasis(x,1,2)^2,lower = c(0,0),upper = c(1,1))
Iteration 1: 65 integrand evaluations so far
[1] 1.00007 +- 1.02672 chisq 0 (0 df)
Iteration 2: 195 integrand evaluations so far
[1] 0.999991 +- 0.186743 chisq 5.24566e-009 (1 df)
Iteration 3: 325 integrand evaluations so far
[1] 0.999995 +- 0.0933761 chisq 5.35999e-009 (2 df)
Iteration 4: 455 integrand evaluations so far
[1] 1 +- 9.50535e-006 chisq 9.0804e-009 (3 df)
integral: 1 (+-9.5e-06)
nregions: 4; number of evaluations: 455; probability: 2.301314e-13
## Check orthogonality
prod.base.base <- function(x, j1, j2) {
  # Product of two tensor-basis functions with multi-indices j1 and j2
  tensor.cosbasis(x, j1[1], j1[2])*tensor.cosbasis(x, j2[1], j2[2])
}
# NB: the call below uses the same multi-index twice, so it reproduces the
# normality check; distinct indices, e.g. c(1,2) and c(2,1), integrate to ~0.
adaptIntegrate(function(x) prod.base.base(x,c(1,2),c(1,2)), lowerLimit = c(0,0),
upperLimit = c(1,1),maxEval = 50000)
$integral
[1] 1
$error
[1] 1.526557e-16
$functionEvaluations
[1] 255
$returnCode
[1] 0
cuhre(2,1,function(x) prod.base.base(x,c(1,2),c(1,2)),lower = c(0,0),upper = c(1,1))
Iteration 1: 65 integrand evaluations so far
[1] 1.00007 +- 1.02672 chisq 0 (0 df)
Iteration 2: 195 integrand evaluations so far
[1] 0.999991 +- 0.186743 chisq 5.24566e-009 (1 df)
Iteration 3: 325 integrand evaluations so far
[1] 0.999995 +- 0.0933761 chisq 5.35999e-009 (2 df)
Iteration 4: 455 integrand evaluations so far
[1] 1 +- 9.50535e-006 chisq 9.0804e-009 (3 df)
integral: 1 (+-9.5e-06)
nregions: 4; number of evaluations: 455; probability: 2.301314e-13
Now that we built our basis we can compute the Fourier Coefficients defined as:
\[\beta_{j_1,j_2}=\left \langle g(x_1,x_2),\,\phi_{j_1,j_2}(x_1,x_2) \right \rangle_{L_2}= \int_{0}^{1}\int_{0}^{1}g(x_1,x_2)\,\phi_{j_1,j_2}(x_1,x_2)\,dx_1dx_2\]
g_x = function(x) {
  x[1] + cos(x[2])   # target function g(x1, x2) = x1 + cos(x2)
}
# Product of the target with a basis function, used by the cuhre call below
prod.base.g <- function(x, j1, j2) g_x(x)*tensor.cosbasis(x, j1, j2)
j1.max <- 50
j2.max <- 50
f.coeff <- matrix(NA, j1.max, j2.max)
for (idx1 in 0:(j1.max-1)){
  for (idx2 in 0:(j2.max-1)){
    coeff = tryCatch(
      cuhre(2, 1, function(x) prod.base.g(x, idx1, idx2),
            lower = c(0,0), upper = c(1,1), flags = list(verbose = 0))$value,
      error = function(e) NA
    )
    f.coeff[idx1+1, idx2+1] = coeff
  }
}
This is a long computation, so we ran it once and stored the results in the file 'fcoeff.txt'; you only need to read.table('fcoeff.txt') them back. Moreover, thanks to our i7 Mac processor, we did not need to set a maximum number of evaluations for the cuhre function :)
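A possible way to reload the stored coefficients (assuming fcoeff.txt was written with write.table from the f.coeff matrix above):
# Reload the pre-computed Fourier coefficients as a numeric matrix
f.coeff <- as.matrix(read.table("fcoeff.txt"))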
Now, instead of evaluating the approximation at just three combinations of J1 and J2, we decided to observe the behaviour of the error over four blocks of cases, the first two with only a few coefficients, the second two with more, plus two particular cases: almost no coefficients (1, 1) and all of them (50, 50). Within each block we fix the total number of coefficients to use and try the different combinations (j1, j2) with j1 + j2 held constant:
j1 j2 err
1 1 5 8.338556e-02
2 2 4 1.315214e-03
3 3 3 1.498807e-03
4 4 2 1.441359e-03
5 5 1 1.944249e-02
6 2 6 1.234234e-03
7 3 5 1.257698e-03
8 4 4 3.012896e-04
9 5 3 4.848833e-04
10 6 2 1.304208e-03
11 3 7 1.221834e-03
12 4 6 2.200715e-04
13 5 5 2.437819e-04
14 6 4 1.633919e-04
15 7 3 3.476281e-04
16 20 40 1.830399e-06
17 30 30 7.114881e-07
18 40 20 9.250931e-07
19 1 1 1.025843e-01
20 50 50 1.004057e-04
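The predict function used in the plots below is the 2D approximant; a plausible reconstruction, consistent with the truncated tensor Fourier sum above (note that this name shadows stats::predict, so a distinct name would be safer in practice):
# Hypothetical reconstruction of the 2D approximant: truncated Fourier sum
# using the first J1 x J2 coefficients stored in f.coeff
predict <- function(x, J1, J2) {
  out <- 0
  for (j1 in 0:(J1 - 1))
    for (j2 in 0:(J2 - 1))
      out <- out + f.coeff[j1 + 1, j2 + 1]*tensor.cosbasis(x, j1, j2)
  out
}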
library(plot3D)
library(rgl)
library(rglwidget)
#Define sequences of X and Y in order to evaluate the function
X <- seq(0, 1, length = 100)
Y <- X
count=length(X)
# Z is a matrix of the results of the function evaluated in X and Y
Z=matrix(nrow=count,ncol=count)
for (i in 1:count) {
  for (k in 1:count) {
    Z[i,k] = g_x(c(X[i], Y[k]))
  }
}
Z[is.na(Z)] <- 1
# We open the 3d plot environment
open3d()
bg3d("white")
material3d(col = "black")
# Plot the function
persp3d(X,Y,Z, col = "lightgrey",
xlab = "X", ylab = "Y", zlab = "g(x)")
X1 <- seq(0, 1, length = 100)
Y1 <- X1
count2=length(X1)
Z1=matrix(nrow=count2,ncol=count2)
for (i in 1:count2) {
  for (k in 1:count2) {
    Z1[i,k] = predict(c(X1[i], Y1[k]), 5, 5)
  }
}
bg3d("white")
nbcol = 100
color = rev(rainbow(nbcol, start = 0/6, end = 4/6))
zcol = cut(Z1, nbcol)
persp3d(X1, Y1, Z1, col = color[zcol],
xlab = "X", ylab = "Y", zlab = "prediction",add=T)
library(plot3D)
library(rgl)
library(rglwidget)
#Define sequences of X and Y in order to evaluate the function
X <- seq(0, 1, length = 100)
Y <- X
count=length(X)
# Z is a matrix of the results of the function evaluated in X and Y
Z=matrix(nrow=count,ncol=count)
for (i in 1:count) {
  for (k in 1:count) {
    Z[i,k] = g_x(c(X[i], Y[k]))
  }
}
Z[is.na(Z)] <- 1
# We open the 3d plot environment
open3d()
bg3d("white")
material3d(col = "black")
# Plot the function
persp3d(X,Y,Z, col = "lightgrey",
xlab = "X", ylab = "Y", zlab = "g(x)")
X1 <- seq(0, 1, length = 100)
Y1 <- X1
count2=length(X1)
Z1=matrix(nrow=count2,ncol=count2)
for (i in 1:count2) {
  for (k in 1:count2) {
    Z1[i,k] = predict(c(X1[i], Y1[k]), 7, 3)
  }
}
bg3d("white")
nbcol = 100
color = rev(rainbow(nbcol, start = 0/6, end = 4/6))
zcol = cut(Z1, nbcol)
persp3d(X1, Y1, Z1, col = color[zcol],
xlab = "X", ylab = "Y", zlab = "prediction",add=T)
library(plot3D)
library(rgl)
library(rglwidget)
#Define sequences of X and Y in order to evaluate the function
X <- seq(0, 1, length = 100)
Y <- X
count=length(X)
# Z is a matrix of the results of the function evaluated in X and Y
Z=matrix(nrow=count,ncol=count)
for (i in 1:count) {
  for (k in 1:count) {
    Z[i,k] = g_x(c(X[i], Y[k]))
  }
}
Z[is.na(Z)] <- 1
# We open the 3d plot environment
open3d()
bg3d("white")
material3d(col = "black")
# Plot the function
persp3d(X,Y,Z, col = "lightgrey",
xlab = "X", ylab = "Y", zlab = "g(x)")
X1 <- seq(0, 1, length = 100)
Y1 <- X1
count2=length(X1)
Z1=matrix(nrow=count2,ncol=count2)
for (i in 1:count2) {
  for (k in 1:count2) {
    Z1[i,k] = predict(c(X1[i], Y1[k]), 37, 37)
  }
}
bg3d("white")
nbcol = 100
color = rev(rainbow(nbcol, start = 0/6, end = 4/6))
zcol = cut(Z1, nbcol)
persp3d(X1, Y1, Z1, col = color[zcol],
xlab = "X", ylab = "Y", zlab = "prediction",add=T)
\newpage
Let's take a look at the correlation between the covariates:
X.cor <- cor(dat[,1:268])
mean(abs(X.cor))
[1] 0.5309039
…a very high (average) correlation, in particular between values observed at nearby frequencies (quite naturally). We are going to have some problems here…
In conclusion, the first six plots showed that the NIR values at nearby frequencies are closely related to one another; the last one also showed that the information carried by some frequencies is redundant, since the curves are often parallel, so we should drop many of them.
#Fitting the model on the train set
m<-lm(y~.,data=pet.train)
#Summaries of the model
head(m$coefficients,28)
(Intercept) X.1 X.2 X.3 X.4 X.5
549.8032 -332.3468 1004.7647 -2073.9778 4109.1016 -8304.2185
X.6 X.7 X.8 X.9 X.10 X.11
13202.7075 -18149.9587 29237.7435 -41178.8220 42563.9510 -35747.4546
X.12 X.13 X.14 X.15 X.16 X.17
19386.5545 -1308.2303 -9830.0209 14586.6114 -20267.4051 27405.1965
X.18 X.19 X.20 X.21 X.22 X.23
-16962.1355 -1887.8544 4341.8303 NA NA NA
X.24 X.25 X.26 X.27
NA NA NA NA
We are encountering the small-n, large-p problem: the number of parameters, i.e. the coefficients of the independent variables, is far larger than the sample size. This consumes all the residual degrees of freedom, so most estimates come out as NA, and so do the related statistics (e.g. the F statistic, since the residual degrees of freedom would be negative).
Since there is a difference in scale between the response and the covariates, we decided to scale \(\mathbb{X}\):
######
ytr<-pet.train[ , 269]
Xtr = pet.train[ , -c(269,270)]
yte = pet.test[ , 269]
Xte = pet.test[ , -c(269,270)]
#Standardising the independent variables
Ztr = scale(Xtr)
Zte = scale(Xte)
fwd.reg <- function(y, X, k.max = ncol(X), stand = TRUE){
  # Standardize if asked
  if (stand) X <- scale(X)
  # Initialize variable sets & other quantities
  S = NULL        # active set
  U = 1:ncol(X)   # inactive set
  k.loc = 0       # current active set dimension
  ee = y          # current residuals
  # Loop
  while (k.loc < k.max){
    ## Update loop-index
    k.loc = k.loc + 1
    ## Step 1: Evaluate correlation with inactive variables
    rr = abs( cor(X[,U], ee) )
    ## Step 2: Extract the most correlated variable & update the sets
    J = U[which.max(rr)]
    S = c(S, J)
    U = U[-which.max(rr)]
    ## Step 3: Regress on active variables and get residuals
    ee = resid( lm(y ~ X[,S]) )
  }
  # Output
  return(S)
}
S <- fwd.reg(ytr, Xtr, k.max = 20)  # 20 forward-selection steps, matching the output below
S
[1] 40 241 1 150 90 59 17 77 56 57 65 4 66 61 63 2 54
[18] 68 67 60
We now have an ordering of the independent variables most correlated with the dependent one. At each step we select the covariate most related to the current residuals of the model, so each step adds information that can lower the error: a variable whose trend resembles the residuals gets pulled into the model. The selection, however, depends on the previous choices, so we cannot claim this is the best model overall, only a locally best one; another set of variables might do better, but finding it would require a different search. At each step we simply perform a local minimization of the error by looking at the relation with the residuals.
# Initialize the vectors of scores
model.score <- matrix(NA, length(S),4)
#y hat
m=lm(ytr ~ Ztr)
yhat <- predict(m, data.frame(Ztr) )
train.error=ytr-yhat
library(MASS)
for (idx in 1:length(S)){
  # Build the data frames with the FWD-selected variables
  ztr <- Ztr[, S[1:idx], drop = FALSE]
  zte <- Zte[, S[1:idx], drop = FALSE]
  xtr <- Xtr[, S[1:idx], drop = FALSE]
  xtr = data.matrix(xtr)
  #### Train-test split scheme
  m = lm(ytr ~ ztr)
  yhat <- predict(m, newdata = list(ztr = zte))
  MSE = mean((yte - yhat)^2)
  #### Leave-one-out
  yhat <- predict(m, newdata = list(ztr = ztr))
  hat = function(x) diag(x %*% ginv(t(x) %*% x) %*% t(x))
  R.LOO = 1/length(ytr)*sum(((ytr - yhat)/(1 - hat(ztr)))^2)
  #### Generalized cross-validation
  R.GCV = 1/20*sum((ytr - yhat)^2)/((1 - (idx/length(train.error)))^2)
  #### 5-fold cross-validation
  nfold <- 5
  rarr <- sample(1:nrow(ztr))
  fold.str <- matrix(rarr, nrow = nfold, ncol = round(nrow(ztr)/nfold))
  ######## NB: there are 21 observations split into 5 folds ---> one observation is discarded
  score = c()
  for (i in 1:nfold) {
    test.idx <- fold.str[i,]
    test.loc <- ztr[test.idx,]
    test.y = ytr[test.idx]
    train.idx = c(fold.str[-i,])
    train.loc = ztr[train.idx,]
    train.y = ytr[train.idx]
    m.loc = lm(train.y ~ train.loc)
    y.hat = predict(m.loc, newdata = list(train.loc = test.loc))
    score[i] = mean((y.hat - test.y)^2)
  }
  cv.score = mean(score)
  model.score[idx,] = c(MSE, cv.score, R.LOO, R.GCV)
}
Optimal number of covariates:
  Train-test split scheme   3
  5-fold CV                12
  Leave-one-out            20
  Generalized CV           20

RMSE:
  k = 3   (Train-test)   14.10791
  k = 12  (5-fold CV)    15.70449
  k = 20  (LOO)          15.04671
  k = 20  (GCV)          15.04671
Overall we used the 20 most correlated variables, so as to keep the residual degrees of freedom positive while running the regression. The optimal number of variables depends on which criterion we use to evaluate the model. In the end, the best model achieved with this algorithm is the one with X.40, X.241 and X.1. We cannot claim this is the best model overall: there are several inconsistencies in our procedure, for example estimating the parameters and computing the error on the same data.